The mutual information (MI) plot (fig) of the STARR-seq reads (40+40 bp surrounding the TSS) shows clear signals along the diagonal, together with additional off-diagonal signals that suggest long-distance cooperation. However, the underlying binding specificities that give rise to these signals are unclear.
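As a reminder of what each cell of such a plot measures, the sketch below computes the MI between the bases at two positions of a set of aligned, equal-length reads. It is a minimal illustration only: the function name `mutual_information` and the plain-string read representation are assumptions, and the original analysis may have applied pseudocounts or background corrections that are not shown here.

```python
import math
from collections import Counter

def mutual_information(reads, i, j):
    """MI (in bits) between the bases at 0-based positions i and j.

    Hypothetical helper: `reads` is a list of aligned strings of equal length.
    """
    pairs = [(r[i], r[j]) for r in reads]
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (base_i, base_j)
    marg_i = Counter(a for a, _ in pairs)  # counts of base at position i
    marg_j = Counter(b for _, b in pairs)  # counts of base at position j
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) rewritten in terms of raw counts
        ratio = c * n / (marg_i[a] * marg_j[b])
        mi += p_ab * math.log2(ratio)
    return mi
```

Evaluating this for every pair (i, j) fills the MI matrix; the diagonal band corresponds to small |i - j|.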
To address this, k-mers were counted for all position pairs with considerable MI signal, and the k-mer ranks were then used for clustering. The PCA result (fig) shows that there are 3 distinct clusters.
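The counting and ranking step could look roughly like the sketch below. It assumes a gapped k-mer made of `half` bases starting at each of the two positions, and average ranks for ties; both choices, and the names `pair_kmer_counts` and `kmer_ranks`, are illustrative assumptions rather than the definitions actually used.

```python
from collections import Counter
from itertools import product

def pair_kmer_counts(reads, i, j, half=3):
    """Count gapped k-mers: `half` bases from position i plus `half` from j (i < j)."""
    counts = Counter()
    for r in reads:
        if j + half > len(r):
            continue  # read too short to cover the second half
        counts[r[i:i + half] + r[j:j + half]] += 1
    return counts

def kmer_ranks(counts, half=3):
    """Rank vector over all 4**(2*half) k-mers in lexicographic order.

    Rank 1 is the most frequent k-mer; tied k-mers share their average rank.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=2 * half)]
    ordered = sorted(kmers, key=lambda k: -counts.get(k, 0))
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        # extend the block while the next k-mer has the same count
        while (end + 1 < len(ordered)
               and counts.get(ordered[end + 1], 0) == counts.get(ordered[start], 0)):
            end += 1
        avg = (start + end) / 2 + 1
        for k in ordered[start:end + 1]:
            ranks[k] = avg
        start = end + 1
    return [ranks[k] for k in kmers]
```

Stacking one such rank vector per position pair gives a matrix whose rows can be passed to PCA or any clustering routine.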
When the clusters are mapped back to their position pairs (the following fig), we can see that both cluster 2 and cluster 3 come from the long-distance MI signals downstream of the TSS.
To get a quick overview, an xyplot between the 3 clusters was generated. The results suggest that cluster 1 consists of a mixture of many weak specificities, while clusters 2 and 3 consist of stronger specificities. An examination of the preferred k-mers in clusters 2 and 3 indicates that the two clusters reflect the same binding event, as their k-mers are frame-shifted versions of one another.
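The frame-shift relationship can be checked mechanically. The sketch below tests whether one k-mer equals another shifted by a few bases; the helper name `frame_shift` and the `max_shift` limit are assumptions, and it treats the k-mers as contiguous strings of equal length (for gapped k-mers the same test would be applied to each half).

```python
def frame_shift(a, b, max_shift=2):
    """Return the offset by which b is a shifted copy of a, or None.

    A positive value s means b starts s bases downstream of a (the last
    len(a) - s bases of a equal the first len(b) - s bases of b); a negative
    value means the reverse. Assumes len(a) == len(b).
    """
    for s in range(1, max_shift + 1):
        if a[s:] == b[:len(b) - s]:
            return s
        if b[s:] == a[:len(a) - s]:
            return -s
    return None
```

For instance, `frame_shift("ACGTGA", "CGTGAC")` reports a shift of one base, which is the pattern expected if the top k-mers of the two clusters describe the same site read in neighbouring frames.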